| No | Name | Meaning | Original Data Name | Short Data Name | Analytic Type | Data Type | Unit Of Measure | Variable Type | Description / Comments |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Row Number | Unique Row# | RowNumber | N/A | N/A | Integer | Serial Number | int64 | DROP Column: Unique values |
| 2 | Customer Id | Unique Customer# | CustomerId | N/A | N/A | Integer | Serial Number | int64 | DROP Column: Unique values |
| 3 | Surname | Cust.Last Name | Surname | N/A | N/A | Alphabetic | N/A | object | DROP Column: Not Useful |
| 4 | Credit Score | Customer's Credit Score | CreditScore | crscr | Quantitative | Integer | N/A | int64 | Predictor Column |
| 5 | Geography | Customer's Country | Geography | geo | Qualitative | Alphabetic | N/A | object | Predictor Column |
| 6 | Gender | Cust.Gender Male/Female | Gender | sex | Qualitative | Alphabetic | N/A | object | Predictor Column |
| 7 | Age | Customer's Age | Age | age | Quantitative | Integer | Year | int64 | Predictor Column |
| 8 | Tenure | Cust's Years with Bank | Tenure | tenure | Quantitative | Integer | Year | int64 | Predictor Column |
| 9 | Balance | Cust's Bank A/c Balance | Balance | bal | Quantitative | Float | Currency | float64 | Predictor Column |
| 10 | Num Of Products | Products Used by Cust | NumOfProducts | prods | Quantitative | Integer | Count | int64 | Predictor Column |
| 11 | Has Credit Card | Has Credit Card? Yes/No (1,0) | HasCrCard | cards | Qualitative | Integer | N/A | int64 | Predictor Column |
| 12 | Is Active Member | Cust. Active? Yes/No (1,0) | IsActiveMember | active | Qualitative | Integer | N/A | int64 | Predictor Column |
| 13 | Estimated Salary | Cust's Approx. Salary | EstimatedSalary | salary | Quantitative | Float | Currency | float64 | Predictor Column |
| 14 | Exited | Cust. Exited? Yes/No (1,0) | Exited | exited | Qualitative | Integer | N/A | int64 | PREDICTED / TARGET Column |
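The dtype column of the dictionary above can be verified mechanically once the file is loaded. A minimal sketch, using a hypothetical two-row stand-in for `bank.csv` (the real check would run on `cdf`):

```python
import pandas as pd

# Hypothetical two-row stand-in for bank.csv, covering the documented dtypes:
cdf = pd.DataFrame({'CreditScore': [600, 700], 'Geography': ['France', 'Spain'],
                    'Balance': [0.0, 1234.56], 'Exited': [0, 1]})
expected = {'CreditScore': 'int64', 'Geography': 'object',
            'Balance': 'float64', 'Exited': 'int64'}
# Columns whose actual dtype disagrees with the data dictionary:
mismatches = {c: str(t) for c, t in cdf.dtypes.items() if str(t) != expected[c]}
print(mismatches)  # empty dict when the data matches the dictionary
```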
| Pts | Criteria |
|---|---|
| 05 | Feature Elimination |
| 05 | Identify Features and Target Variable |
| 05 | Data Split into Train & Test Sets |
| 10 | Normalization of Data |
| 20 | Build Model, identify Points to Improve & Implement (Optimization) |
| 10 | Predict Output for Input data using Threshold = 0.5 |
| 05 | Evaluate model Performance using Confusion Matrix & Accuracy Score |
D1. Feature Elimination : 5 Marks
D2. Bivariate : 5 Marks
D3. Data Split : 5 Marks
D4. Normalization : 10 Marks
D5. Modelling : 20 Marks
D6. Prediction at 0.5 Threshold : 10 Marks
D7. Model Performance Evaluation : 5 Marks
Load Libraries : ⬇
# v====== Standard Libraries Begin ======v #
# from google.colab import drive
# drive.mount('/content/drive')
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt # Data visualization for plotting
import matplotlib.image as mpimg # To handle plot images, diskfile save, retrieve, render/display
from matplotlib import cm # Color Maps
from mpl_toolkits.mplot3d import Axes3D # MSB: For 3D plots: 5 USL KMeans Case Study Mr.Tom
%matplotlib inline
import pandas as pd # to handle data in form of rows and columns
import pandas_profiling
from pandas import ExcelWriter # Outputs an Excel disk file
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras import initializers
from tensorflow.keras import optimizers
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from category_encoders import OrdinalEncoder
from scipy.stats import zscore, pearsonr, randint as sp_randint # For LinReg
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler, PolynomialFeatures, binarize, LabelEncoder, OneHotEncoder # M.Rao
from sklearn import metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score # LinRegr
from sklearn.metrics import confusion_matrix, recall_score, precision_score, accuracy_score, \
f1_score, roc_curve, roc_auc_score, classification_report, auc # LogRegr
# For Linear Dimensionality (Cols/Attributes) Reduction to a Lower dimensional space (eg: reduce 15 cols to 2 cols):
from sklearn.decomposition import PCA # 5 UL : "Principal Component Analysis" for "Singular Value Decomposition" (SVD)
from sklearn.pipeline import Pipeline, make_pipeline # M.Rao
from yellowbrick.regressor import ResidualsPlot
from yellowbrick.classifier import ClassificationReport, ROCAUC
# Utilities:
# "Aalmond" library is self created by myself (Manoj S Bhave) and is made available publically on PyPi.org which provides 3 Functions:
# 1. vitalStats(df) : Extends df.describe() output to include outliers, -ve, zero, nulls, uniques values, modes, skewness, etc.
# 2. showOutL(df) : Detects, Displays and Imputes outlier values based on IQR method.
# 3. showdfQ(df) : Displays dataframe rows from a variety of sections of the df like mid df, mid q1/q3 of df, head, tail, random etc.
import Aalmond.Aalmond as aa
# Multiple output displays per cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import Image, Markdown
from IPython.display import display, HTML # (IPython.core.display is deprecated)
display(HTML("<style>.container { width:98% !important; }</style>")) # Increase cell width
# ===== Settings =====
pd.options.display.float_format = '{:,.2f}'.format # Remove scientific notations to display numbers with 2 decimals
pd.set_option('display.max_columns', 100) # Max df cols to display set to 100.
pd.set_option('display.max_rows', 50) # Max df rows to display set to 50.
# pd.set_option('display.max_rows', tdf.shape[0]+1) # just one row more than the total rows in df
print('TensorFlow Version:', tf.__version__)
print('Keras Version:', keras.__version__)
# ====== Standard Libraries End ======^ #
Functions: Define Custom Utilities⬇
def LossAccPlot(history):
    """
    Custom Local Function: For Deep Learning Neural Network Project:
    Plots 2 Curves Side by Side for Train & Validation Datasets (Color Coded):
    1. Loss over Epochs.
    2. Accuracy over Epochs.
    """
    plt.figure(figsize=(15, 5))
    plt.subplot(1, 2, 1)
    plt.plot(np.array(history.history['loss']) * 100)
    plt.plot(np.array(history.history['val_loss']) * 100)
    plt.ylabel('Loss')
    plt.xlabel('Epochs')
    plt.legend(['Train', 'Validation'])
    plt.title('Loss Over Epochs')
    plt.subplot(1, 2, 2)
    plt.plot(np.array(history.history['accuracy']) * 100)
    plt.plot(np.array(history.history['val_accuracy']) * 100)
    plt.ylabel('Accuracy')
    plt.xlabel('Epochs')
    plt.legend(['Train', 'Validation'])
    plt.title('Accuracy Over Epochs')
    plt.tight_layout()
    plt.show()
# Read & Load the input Datafile into Dataset frame: Bank Customer DataFrame:
cdf = pd.read_csv('bank.csv')
cdf
# Housekeeping: Incremental DF Data Backup as of now:
Markdown("### Incremental DF Data Backup 0")
cdf0 = cdf.copy() # Original Df
cdf.to_csv('cdf0.csv') # Also export as .csv file to disk
# Verify backup copy
! ls -l cdf*
cdf0.shape, type(cdf0)
cdf0.sample(6)
# DROP column 'RowNumber' (Serial Number), 'CustomerId' (Customer ID), 'Surname' (Customer Last Name):
# These cols do not provide any Predictive or Analytical value to the Model Training / Learning process:
cdf.drop(['RowNumber', 'CustomerId', 'Surname'], axis = 1, inplace = True)
cdf.head()
# Rename column names for convenience and/or meaningfulness:
cdf.head(3)
cdf.rename(columns={ 'CreditScore' : 'crscr' , 'Geography' : 'geo' , 'Gender' : 'sex' , 'Age' : 'age' , 'Tenure' : 'tenure' ,
'Balance' : 'bal', 'NumOfProducts' : 'prods', 'HasCrCard' : 'cards', 'IsActiveMember' : 'active',
'EstimatedSalary' : 'salary', 'Exited' : 'exited' },
inplace=True, errors='raise')
cdf.head(3)
# Housekeeping: Incremental DF Data Backup as of now:
Markdown("### Incremental DF Data Backup 1")
cdf1 = cdf.copy() # DF Modified: Changed col names
cdf.to_csv('cdf1.csv') # Also export as .csv file to disk
# Verify backup copy
! ls -l cdf*
cdf1.shape, type(cdf1)
cdf1.sample(6)
### Housekeeping: Incremental Jupyter Code File Backup 1 as of now ^^^
Markdown("### Incremental Jupyter Notebook Code File Backup 1")
! cp "Project 6 NN Bank Churn Prediction.ipynb" \
"Project 6 NN Bank Churn Prediction 1.ipynb"
! ls -l Project*.ipynb
# profile = pandas_profiling.ProfileReport(cdf)
# profile
pandas_profiling.ProfileReport(cdf)
# Label Encode the two Categorical Columns: 'geo' & 'sex':
cdf.sex.value_counts()
cdf['sex'] = cdf['sex'].replace(['Female', 'Male'], [0, 1])
cdf.sex.value_counts()
# Label Encode the two Categorical Columns: 'geo' & 'sex':
cdf.geo.value_counts()
cdf['geo'] = cdf['geo'].replace(['France', 'Germany', 'Spain'], [1, 2, 3])
cdf.geo.value_counts()
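As an alternative to the hand-written `.replace` mappings above, the already-imported `LabelEncoder` can derive the integer codes automatically. Note it assigns 0-based codes in sorted order (France→0, Germany→1, Spain→2), unlike the 1-based mapping used above. A minimal sketch on a hypothetical mini-frame:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical mini-frame standing in for cdf's two categorical columns:
demo = pd.DataFrame({'geo': ['France', 'Germany', 'Spain', 'France'],
                     'sex': ['Female', 'Male', 'Male', 'Female']})
le = LabelEncoder()
for col in ['geo', 'sex']:
    demo[col] = le.fit_transform(demo[col])  # 0-based codes, in sorted order
print(demo['geo'].tolist())  # [0, 1, 2, 0]
print(demo['sex'].tolist())  # [0, 1, 1, 0]
```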
# Housekeeping: Incremental DF Data Backup as of now:
Markdown("### Incremental DF Data Backup 2")
cdf2 = cdf.copy() # DF Modified: Label encoded cols 'geo', 'sex'. Datatype got changed from 'object' to 'int64'
cdf.to_csv('cdf2.csv') # Also export as .csv file to disk
# Verify backup copy
! ls -l cdf*
cdf2.shape, type(cdf2)
cdf2.sample(5)
# Basic vital stats for the data, using custom function(s) from the self-created public lib "Aalmond" version 0.1:
# The "Aalmond" library / module was written by the author (Manoj S Bhave) and is published on pypi.org under the MIT License.
# For more info and help, press Shift+Tab on the function name:
aa.vitalStats(cdf, dcols=True, srows='m0')
Observations from the Profile Report df stats above:
| Degree | Lower Bound | Upper Bound | Type | Comments |
|---|---|---|---|---|
| High | < −1.0 | > +1.0 | Asymmetric | Outside of Lower & Upper |
| Moderate | −1.0 to −0.5 | +0.5 to +1.0 | Asymmetric | Within Lower OR Upper band |
| Low | −0.5 | +0.5 | Asymmetric | Within Lower AND Upper |
| Very Low | −0.25 | +0.25 | Symmetric | Within Lower AND Upper |
| No Skew | ≈ 0.0 | ≈ 0.0 | Symmetric | Almost zero (+ or −) |
ACTION items from the above observations: Outliers in the four columns need not be imputed. Binning, Scaling and Categorization should be done instead, which will also reduce skewness. The target col 'exited' (a binary categorical col) should not be imputed, because imputation would turn all its values to zero.
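To make the binning action above concrete: skewness can be measured with `Series.skew()` against the degree table, and equal-frequency binning (`pd.qcut`) flattens a long tail into balanced categories. A minimal sketch on synthetic right-skewed data, not the actual bank columns:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
skewed = pd.Series(rng.exponential(scale=2.0, size=1000))  # right-skewed toy data

print(round(skewed.skew(), 2))  # well above +1 -> "High" degree per the table

# Equal-frequency binning: each quartile bin gets the same number of rows:
binned = pd.qcut(skewed, q=4, labels=['low', 'mid_lo', 'mid_hi', 'high'])
print(binned.value_counts().min(), binned.value_counts().max())
```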
# Custom Function showOutL() Detects, Displays and/or Imputes Outliers by the IQR method.
# Custom library / module "Aalmond" written by the author (Manoj S Bhave) and published on PyPI.
# For more details use Shift+Tab on this function below:
# Use the Custom Function showOutL() in "Display Only" mode (NO Impute):
aa.showOutL(cdf) # list Outliers
# Box & Whisker plots to check for Outliers:
cdf.plot(kind='box', subplots=True, layout=(4,4), fontsize=8, figsize=(14,14))
plt.show();
cdf.corr()
sns.heatmap(cdf.corr())
sns.pairplot(cdf, hue='exited')
# Do One Hot Encoding for Category Columns: 'geo', 'prods'
# All other cols are either binary (0,1) cat.cols. OR continuous variables (to be Scaled/Normalized)
cdf = pd.get_dummies(cdf, columns=['geo', 'prods'])
cdf
# Housekeeping: Incremental DF Data Backup as of now:
Markdown("### Incremental DF Data Backup 3")
cdf3 = cdf.copy() # DF Modified: One Hot encoded cols 'geo', 'prods'
cdf.to_csv('cdf3.csv') # Also export as .csv file to disk
# Verify backup copy
! ls -l cdf*
cdf3.shape, type(cdf3)
cdf3.sample(5)
### Housekeeping: Incremental Jupyter Code File Backup 2 as of now ^^^
Markdown("### Incremental Jupyter Notebook Code File Backup 2")
! cp "Project 6 NN Bank Churn Prediction.ipynb" \
"Project 6 NN Bank Churn Prediction 2.ipynb"
! ls -l Project*.ipynb
# Prepare data for split: Create X, y (Predictors, Predicted) datasets for Train & Test:
X = cdf.copy()
y = X.pop('exited')
# Split df data into 2 datasets: Train, Test:
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=0.2, random_state=1)
X.shape, y.shape, X_trn.shape, X_tst.shape, y_trn.shape, y_tst.shape
type(X_trn), type(X_tst), type(y_trn), type(y_tst)
# Get X_ dataset column labels:
X_cols = X_trn.columns
X_cols
# Instantiate the scaler / normalizer & apply it (fit on the Train set only, to avoid test-set leakage):
mm_sclr = MinMaxScaler()
X_trn = mm_sclr.fit_transform(X_trn)
X_tst = mm_sclr.transform(X_tst) # transform only: reuse the Train set's min/max
# Convert X_ dataset back to df from array:
X_trn = pd.DataFrame(X_trn)
X_tst = pd.DataFrame(X_tst)
# Get X_ dataset col names back:
X_trn.columns = X_cols
X_tst.columns = X_cols
# Convert y_ dataset back to df from array:
y_trn = pd.DataFrame(y_trn)
y_tst = pd.DataFrame(y_tst)
# Save Train & Test datasets to disk:
X_trn.to_csv('X_trn.csv')
X_tst.to_csv('X_tst.csv')
y_trn.to_csv('y_trn.csv')
y_tst.to_csv('y_tst.csv')
# Verify saved copies:
! ls -l *trn*
! ls -l *tst*
### Housekeeping: Incremental Jupyter Code File Backup 3 as of now ^^^
Markdown("### Incremental Jupyter Notebook Code File Backup 3")
! cp "Project 6 NN Bank Churn Prediction.ipynb" \
"Project 6 NN Bank Churn Prediction 3.ipynb"
! ls -l Project*.ipynb
# Create dfs to track Model Params & the corresponding Results:
model_prm = pd.DataFrame(index=['Layers', 'Optimizer', 'Lrn_Rate', 'Batch_Size', 'Epochs', 'Activation_Function', 'Loss_Function'])
model_cmp = pd.DataFrame(index=['Accuracy', 'Recall', 'Precision', 'F1_Score', 'Val_Loss'])
model_prm.T
model_cmp.T
pd.options.display.float_format = '{:,.6f}'.format
# Instantiate Neural Network Sequential Model:
model1 = Sequential()
# Add layers & Build NN Model:
# Input layer:
model1.add(Dense(32, input_dim=15, activation='relu'))
# Hidden layers:
model1.add(Dense(16, activation='relu'))
model1.add(Dense( 5, activation='relu'))
model1.add(Dense( 3, activation='relu'))
# Dropout layer
# model.add(tf.keras.layers.Dropout(0.5))
# Output layer:
model1.add(Dense(1, activation='sigmoid'))
model1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model1.summary()
# Execute & Train the Model:
history = model1.fit(X_trn, y_trn, batch_size=30, epochs=10, verbose=1, validation_data=(X_tst, y_tst))
LossAccPlot(history)
# Predict with the model at Threshold = 0.5
y_test_preds = np.where(model1.predict(X_tst) > 0.5, 1, 0)
# Get Class Prediction for the Confusion Matrix (predict_classes was removed in TF 2.6+):
Y_pred_cls = (model1.predict(X_tst, batch_size=200, verbose=0) > 0.5).astype(int)
# Get Validation Loss:
val_loss = model1.evaluate(X_tst, y_tst.values)[0] # model1.metrics_names[0] = 'loss', [1] = 'accuracy'
# Compile Model Params & Scores in these dfs for comparison with other models:
model_prm['Model_1'] = ['3 Hidden', 'Adam', 0.001, 30, 10, 'Relu, Sigmoid', 'binary_crossentropy']
model_cmp['Model_1'] = [ accuracy_score( y_tst, y_test_preds),
recall_score( y_tst, y_test_preds),
precision_score( y_tst, y_test_preds),
f1_score( y_tst, y_test_preds),
val_loss
]
# Print Accuracy Score and other Scores:
print('Model Parameters:')
model_prm.T
print('Model Accuracy & Other Scores:')
model_cmp.T
# Print Confusion Matrix:
print('Confusion Matrix:')
confusion_matrix(y_tst.values, Y_pred_cls)
### Housekeeping: Incremental Jupyter Code File Backup 4 as of now ^^^
Markdown("##### Incremental Jupyter Notebook Code File Backup 4")
! cp "Project 6 NN Bank Churn Prediction.ipynb" \
"Project 6 NN Bank Churn Prediction 4.ipynb"
! ls -l Project*.ipynb
### Housekeeping: Incremental Jupyter Code File Backup 5 as of now ^^^
Markdown("##### Incremental Jupyter Notebook Code File Backup 5")
! cp "Project 6 NN Bank Churn Prediction.ipynb" \
"Project 6 NN Bank Churn Prediction 5.ipynb"
! ls -l Project*.ipynb
# Instantiate Neural Network Sequential Model:
model1 = Sequential()
# Add layers & Build NN Model:
# Input layer:
model1.add(Dense(32, input_dim=15, activation='relu'))
# Hidden layers:
model1.add(Dense(16, activation='relu'))
model1.add(Dense( 5, activation='relu'))
model1.add(Dense( 3, activation='relu'))
# Dropout layer
# model.add(tf.keras.layers.Dropout(0.5))
# Output layer:
model1.add(Dense(1, activation='sigmoid'))
opt = optimizers.Adam(learning_rate=0.005) # 'lr' argument is deprecated
model1.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
model1.summary()
# Execute & Train the Model:
history = model1.fit(X_trn, y_trn, batch_size=30, epochs=10, verbose=1, validation_data=(X_tst, y_tst))
LossAccPlot(history)
# Predict with the model at Threshold = 0.5
y_test_preds = np.where(model1.predict(X_tst) > 0.5, 1, 0)
# Get Class Prediction for the Confusion Matrix (predict_classes was removed in TF 2.6+):
Y_pred_cls = (model1.predict(X_tst, batch_size=200, verbose=0) > 0.5).astype(int)
# Get Validation Loss:
val_loss = model1.evaluate(X_tst, y_tst.values)[0] # model1.metrics_names[0] = 'loss', [1] = 'accuracy'
# Compile Model Params & Scores in these dfs for comparison with other models:
model_prm['Model_2'] = ['3 Hidden', 'Adam', 0.005, 30, 10, 'Relu, Sigmoid', 'binary_crossentropy']
model_cmp['Model_2'] = [ accuracy_score( y_tst, y_test_preds),
recall_score( y_tst, y_test_preds),
precision_score( y_tst, y_test_preds),
f1_score( y_tst, y_test_preds),
val_loss
]
# Print Accuracy Score and other Scores:
print('Model Parameters:')
model_prm.T
print('Model Accuracy & Other Scores:')
model_cmp.T
# Print Confusion Matrix:
print('Confusion Matrix:')
confusion_matrix(y_tst.values, Y_pred_cls)
### Housekeeping: Incremental Jupyter Code File Backup 6 as of now ^^^
Markdown("##### Incremental Jupyter Notebook Code File Backup 6")
! cp "Project 6 NN Bank Churn Prediction.ipynb" \
"Project 6 NN Bank Churn Prediction 6.ipynb"
! ls -l Project*.ipynb
# Instantiate Neural Network Sequential Model:
model1 = Sequential()
# Add layers & Build NN Model:
# Input layer:
model1.add(Dense(32, input_dim=15, activation='relu'))
# Hidden layers:
model1.add(Dense(16, activation='relu'))
# model1.add(Dense( 5, activation='relu'))
model1.add(Dense( 3, activation='relu'))
# Dropout layer
# model.add(tf.keras.layers.Dropout(0.5))
# Output layer:
model1.add(Dense(1, activation='sigmoid'))
opt = optimizers.Adam(learning_rate=0.001) # 'lr' argument is deprecated
model1.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
model1.summary()
# Execute & Train the Model:
history = model1.fit(X_trn, y_trn, batch_size=30, epochs=10, verbose=1, validation_data=(X_tst, y_tst))
LossAccPlot(history)
# Predict with the model at Threshold = 0.5
y_test_preds = np.where(model1.predict(X_tst) > 0.5, 1, 0)
# Get Class Prediction for the Confusion Matrix (predict_classes was removed in TF 2.6+):
Y_pred_cls = (model1.predict(X_tst, batch_size=200, verbose=0) > 0.5).astype(int)
# Get Validation Loss:
val_loss = model1.evaluate(X_tst, y_tst.values)[0] # model1.metrics_names[0] = 'loss', [1] = 'accuracy'
# Compile Model Params & Scores in these dfs for comparison with other models:
model_prm['Model_3'] = ['2 Hidden', 'Adam', 0.001, 30, 10, 'Relu, Sigmoid', 'binary_crossentropy']
model_cmp['Model_3'] = [ accuracy_score( y_tst, y_test_preds),
recall_score( y_tst, y_test_preds),
precision_score( y_tst, y_test_preds),
f1_score( y_tst, y_test_preds),
val_loss
]
# Print Accuracy Score and other Scores:
print('Model Parameters:')
model_prm.T
print('Model Accuracy & Other Scores:')
model_cmp.T
# Print Confusion Matrix:
print('Confusion Matrix:')
confusion_matrix(y_tst.values, Y_pred_cls)
# Instantiate Neural Network Sequential Model:
model1 = Sequential()
# Add layers & Build NN Model:
# Input layer:
model1.add(Dense(32, input_dim=15, activation='relu'))
# Hidden layers:
model1.add(Dense(16, activation='relu'))
# model1.add(Dense( 5, activation='relu'))
# model1.add(Dense( 3, activation='relu'))
# Dropout layer
# model.add(tf.keras.layers.Dropout(0.5))
# Output layer:
model1.add(Dense(1, activation='sigmoid'))
opt = optimizers.Adam(learning_rate=0.001) # 'lr' argument is deprecated
model1.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
model1.summary()
# Execute & Train the Model:
history = model1.fit(X_trn, y_trn, batch_size=30, epochs=30, verbose=1, validation_data=(X_tst, y_tst))
LossAccPlot(history)
# Predict with the model at Threshold = 0.5
y_test_preds = np.where(model1.predict(X_tst) > 0.5, 1, 0)
# Get Class Prediction for the Confusion Matrix (predict_classes was removed in TF 2.6+):
Y_pred_cls = (model1.predict(X_tst, batch_size=200, verbose=0) > 0.5).astype(int)
# Get Validation Loss:
val_loss = model1.evaluate(X_tst, y_tst.values)[0] # model1.metrics_names[0] = 'loss', [1] = 'accuracy'
# Compile Model Params & Scores in these dfs for comparison with other models:
model_prm['Model_4'] = ['1 Hidden', 'Adam', 0.001, 30, 30, 'Relu, Sigmoid', 'binary_crossentropy']
model_cmp['Model_4'] = [ accuracy_score( y_tst, y_test_preds),
recall_score( y_tst, y_test_preds),
precision_score( y_tst, y_test_preds),
f1_score( y_tst, y_test_preds),
val_loss
]
# Print Accuracy Score and other Scores:
print('Model Parameters:')
model_prm.T
print('Model Accuracy & Other Scores:')
model_cmp.T
# Print Confusion Matrix:
print('Confusion Matrix:')
confusion_matrix(y_tst.values, Y_pred_cls)
# Instantiate Neural Network Sequential Model:
model1 = Sequential()
# Add layers & Build NN Model:
# Input layer:
model1.add(Dense(32, input_dim=15, activation='relu'))
# Hidden layers:
model1.add(Dense(16, activation='tanh'))
# model1.add(Dense( 5, activation='relu'))
# model1.add(Dense( 3, activation='relu'))
# Dropout layer
# model.add(tf.keras.layers.Dropout(0.5))
# Output layer:
model1.add(Dense(1, activation='sigmoid'))
# opt = optimizers.Adam(learning_rate=0.001)
opt = optimizers.SGD(learning_rate=0.006)
model1.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
model1.summary()
# Execute & Train the Model:
history = model1.fit(X_trn, y_trn, batch_size=30, epochs=10, verbose=1, validation_data=(X_tst, y_tst))
LossAccPlot(history)
# Predict with the model at Threshold = 0.5
y_test_preds = np.where(model1.predict(X_tst) > 0.5, 1, 0)
# Get Class Prediction for the Confusion Matrix (predict_classes was removed in TF 2.6+):
Y_pred_cls = (model1.predict(X_tst, batch_size=200, verbose=0) > 0.5).astype(int)
# Get Validation Loss:
val_loss = model1.evaluate(X_tst, y_tst.values)[0] # model1.metrics_names[0] = 'loss', [1] = 'accuracy'
# Compile Model Params & Scores in these dfs for comparison with other models:
model_prm['Model_5'] = ['1 Hidden', 'SGD', 0.006, 30, 10, 'TanH, Sigmoid', 'binary_crossentropy']
model_cmp['Model_5'] = [ accuracy_score( y_tst, y_test_preds),
recall_score( y_tst, y_test_preds),
precision_score( y_tst, y_test_preds),
f1_score( y_tst, y_test_preds),
val_loss
]
# Print Accuracy Score and other Scores:
print('Model Parameters:')
model_prm.T
print('Model Accuracy & Other Scores:')
model_cmp.T
# Print Confusion Matrix:
print('Confusion Matrix:')
confusion_matrix(y_tst.values, Y_pred_cls)
# Instantiate Neural Network Sequential Model:
model1 = Sequential()
# Add layers & Build NN Model:
# Input layer:
model1.add(Dense(32, input_dim=15, activation='relu'))
# Hidden layers:
model1.add(Dense(16, activation='relu'))
model1.add(Dense( 5, activation='relu'))
model1.add(Dense( 3, activation='relu'))
# Dropout layer
# model.add(tf.keras.layers.Dropout(0.5))
# Output layer:
model1.add(Dense(1, activation='sigmoid'))
opt = optimizers.Adam(learning_rate=0.003) # 'lr' argument is deprecated
# opt = optimizers.SGD(learning_rate=0.006)
model1.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
model1.summary()
# Execute & Train the Model:
history = model1.fit(X_trn, y_trn, batch_size=60, epochs=10, verbose=1, validation_data=(X_tst, y_tst))
LossAccPlot(history)
# Predict with the model at Threshold = 0.5
y_test_preds = np.where(model1.predict(X_tst) > 0.5, 1, 0)
# Get Class Prediction for the Confusion Matrix (predict_classes was removed in TF 2.6+):
Y_pred_cls = (model1.predict(X_tst, batch_size=200, verbose=0) > 0.5).astype(int)
# Get Validation Loss:
val_loss = model1.evaluate(X_tst, y_tst.values)[0] # model1.metrics_names[0] = 'loss', [1] = 'accuracy'
# Compile Model Params & Scores in these dfs for comparison with other models:
model_prm['Model_6'] = ['3 Hidden', 'Adam', 0.003, 60, 10, 'Relu, Sigmoid', 'binary_crossentropy']
model_cmp['Model_6'] = [ accuracy_score( y_tst, y_test_preds),
recall_score( y_tst, y_test_preds),
precision_score( y_tst, y_test_preds),
f1_score( y_tst, y_test_preds),
val_loss
]
# Print Accuracy Score and other Scores:
print('Model Parameters:')
model_prm.T
print('Model Accuracy & Other Scores:')
model_cmp.T
# Print Confusion Matrix:
print('Confusion Matrix:')
confusion_matrix(y_tst.values, Y_pred_cls)
# Save Model Param & Score dfs to disk:
model_prm.to_csv('model_prm.csv')
model_cmp.to_csv('model_cmp.csv')
# Verify saved copies:
! ls -l model*
From the score dfs above, and from the actual `model.fit` runs for the individual models, Model 4 has the best overall scores:
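The winner can also be picked programmatically from `model_cmp` rather than by eye. A hedged sketch using hypothetical toy scores in place of the real `model_cmp` values:

```python
import pandas as pd

# Hypothetical stand-in for model_cmp (models as columns, metrics as the index):
model_cmp = pd.DataFrame({'Model_1': [0.853, 0.45], 'Model_4': [0.866, 0.56]},
                         index=['Accuracy', 'Recall'])
best = model_cmp.loc['Accuracy'].idxmax()  # column label with the highest accuracy
print(best)  # 'Model_4'
```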
# Highlight and Re Display Stats for ONLY the Specific Best Model from the model_prm & model_cmp dfs:
Markdown('* **Best Model 4 Parameters:**')
model_prm.iloc[:, 3:4].T
Markdown('* **Best Model 4 Accuracy Score & Stats:**')
model_cmp.iloc[:, 3:4].T
Markdown('**<u>Best Model 4 Confusion Matrix</u>:** From the Model 4 "model.fit" Run:')
# Confusion Matrix copied from the previous Output Cell # 709 above for the Model 4 run:
# array([[1500, 85],
#        [ 183, 232]])
# Recreate the Confusion Matrix data df from that Model 4 output:
cm = pd.DataFrame( [ [1500, 85], [183, 232] ] )
g = sns.heatmap(cm, annot=True, fmt='d')
print('Accuracy:', (cm.iloc[0, 0] + cm.iloc[1, 1]) * 100 / len(y_tst), '% of Test Data was Classified Correctly!')
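For cross-checking, every reported score can be derived by hand from the pasted confusion matrix; with sklearn's convention (rows = actual, columns = predicted) the layout is [[TN, FP], [FN, TP]]:

```python
# Values from the Model 4 confusion matrix above:
tn, fp, fn, tp = 1500, 85, 183, 232

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are right
recall    = tp / (tp + fn)                   # share of actual churners caught
precision = tp / (tp + fp)                   # share of predicted churners that did churn
f1        = 2 * precision * recall / (precision + recall)
print(round(accuracy, 3), round(recall, 3), round(precision, 3), round(f1, 3))
# 0.866 0.559 0.732 0.634
```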